nlp researcher
NLP Meets the World: Toward Improving Conversations With the Public About Natural Language Processing Research
Recent developments in large language models (LLMs) have been accompanied by rapidly growing public interest in natural language processing (NLP). This attention is reflected by major news venues, which sometimes invite NLP researchers to share their knowledge and views with a wide audience. Recognizing the opportunities of the present, for both the research field and for individual researchers, this paper shares recommendations for communicating with a general audience about the capabilities and limitations of NLP. These recommendations cover three themes: vague terminology as an obstacle to public understanding, unreasonable expectations as obstacles to sustainable growth, and ethical failures as obstacles to continued support. Published NLP research and popular news coverage are cited to illustrate these themes with examples. The recommendations promote effective, transparent communication with the general public about NLP, in order to strengthen public understanding and encourage support for research.
Speciesism in Natural Language Processing Research
Takeshita, Masashi, Rzepka, Rafal
Natural Language Processing (NLP) research on AI safety and social bias in AI has focused on safety for humans and social bias against human minorities. However, some AI ethicists have argued that the moral significance of nonhuman animals has been ignored in AI research. The purpose of this study is therefore to investigate whether there is speciesism, i.e., discrimination against nonhuman animals, in NLP research. First, we explain why nonhuman animals are relevant to NLP research. Next, we survey the findings of existing research on speciesism in NLP researchers, data, and models, and investigate the problem further in this study. The findings suggest that speciesism exists within researchers, data, and models alike. Specifically, our survey and experiments show that (a) NLP researchers, even those who study social bias in AI, do not recognize speciesism or speciesist bias; (b) speciesist bias is inherent in the annotations of datasets used to evaluate NLP models; and (c) OpenAI GPTs, recent NLP models, exhibit speciesist bias by default. Finally, we discuss how speciesism in NLP research can be reduced.
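The probing experiments the abstract describes can be pictured with a minimal, hypothetical sketch: build matched sentence pairs that differ only in whether the subject is a human or a nonhuman animal, then compare a model's scores or continuations for each. The template and terms below are illustrative assumptions, not the paper's actual stimuli.

```python
def probe_pair(template, human_term="a human", animal_term="a pig"):
    """Build a matched sentence pair for a speciesism probe: the same
    frame with a human vs. a nonhuman-animal subject. Downstream, a
    model's responses to the two sentences would be compared."""
    return template.format(human_term), template.format(animal_term)

# Illustrative frame; any systematic asymmetry in model behaviour
# across such pairs would be evidence of speciesist bias.
human_sent, animal_sent = probe_pair("It is acceptable to harm {} for convenience.")
```

On real models, each sentence would be scored (e.g., by likelihood or by the model's stated moral judgement) and the per-pair differences aggregated.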
What is the social benefit of hate speech detection research? A Systematic Review
While NLP research into hate speech detection has grown exponentially over the last three decades, there has been minimal uptake or engagement from policy makers and non-profit organisations. We argue that the absence of ethical frameworks has contributed to this rift between current practice and best practice. By adopting appropriate ethical frameworks, NLP researchers may enable the social-impact potential of hate speech research. This position paper is informed by a review of forty-eight hate speech detection systems associated with thirty-seven publications from different venues.
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Si, Chenglei, Yang, Diyi, Hashimoto, Tatsunori
Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autonomously generate and validate new ideas. Despite this, no evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas, let alone perform the entire research process. We address this by establishing an experimental design that evaluates research idea generation while controlling for confounders, and we perform the first head-to-head comparison between expert NLP researchers and an LLM ideation agent. By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even for experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome.
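As a rough illustration of the head-to-head comparison above (not the paper's actual analysis or data), a two-sided permutation test can check whether blind-review novelty ratings for one group of ideas exceed those for another; the scores below are made up.

```python
import random
import statistics

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ratings
    between two groups of blind-review scores."""
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            count += 1
    return observed, count / n_resamples

# Illustrative novelty ratings on a 1-10 scale; not the study's data.
llm_scores = [6, 7, 5, 8, 6, 7, 6, 5, 7, 8]
human_scores = [5, 6, 4, 6, 5, 5, 6, 4, 5, 6]
diff, p = permutation_test(llm_scores, human_scores)
```

On real review data, a p-value below 0.05 would correspond to the significance level the abstract reports.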
On the Origins of Bias in NLP through the Lens of the Jim Code
Elsafoury, Fatma, Abercrombie, Gavin
In this paper, we trace the biases in current natural language processing (NLP) models back to their origins in racism, sexism, and homophobia over the last 500 years. We review literature from critical race theory, gender studies, data ethics, and digital humanities studies, and summarize the origins of bias in NLP models from these social science perspectives. We show how the causes of the biases in the NLP pipeline are rooted in social issues. Finally, we argue that the only way to fix the bias and unfairness in NLP is by addressing the social problems that caused them in the first place and by incorporating social sciences and social scientists in efforts to mitigate bias in NLP models. We provide actionable recommendations for the NLP research community to do so.
Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources
Yu, Xinyan Velocity, Asai, Akari, Chatterjee, Trina, Hu, Junjie, Choi, Eunsol
While the NLP community is generally aware of resource disparities among languages, we lack research that quantifies the extent and types of such disparity. Prior surveys estimating the availability of resources based on the number of datasets can be misleading as dataset quality varies: many datasets are automatically induced or translated from English data. To provide a more comprehensive picture of language resources, we examine the characteristics of 156 publicly available NLP datasets. We manually annotate how they are created, including input text and label sources and tools used to build them, and what they study, tasks they address and motivations for their creation. After quantifying the qualitative NLP resource gap across languages, we discuss how to improve data collection in low-resource languages. We survey language-proficient NLP researchers and crowd workers per language, finding that their estimated availability correlates with dataset availability. Through crowdsourcing experiments, we identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform. We conclude by making macro and micro-level suggestions to the NLP community and individual researchers for future multilingual data development.
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Michael, Julian, Holtzman, Ari, Parrish, Alicia, Mueller, Aaron, Wang, Alex, Chen, Angelica, Madaan, Divyam, Nangia, Nikita, Pang, Richard Yuanzhe, Phang, Jason, Bowman, Samuel R.
We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: For example, respondents are split almost exactly in half on questions about the importance of artificial general intelligence, whether language models understand language, and the necessity of linguistic structure and inductive bias for solving NLP problems. In addition, the survey posed meta-questions, asking respondents to predict the distribution of survey responses. This allows us not only to gain insight on the spectrum of beliefs held by NLP researchers, but also to uncover false sociological beliefs where the community's predictions don't match reality. We find such mismatches on a wide range of issues. Among other results, the community greatly overestimates its own belief in the usefulness of benchmarks and the potential for scaling to solve real-world problems, while underestimating its own belief in the importance of linguistic structure, inductive bias, and interdisciplinary science.
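The metasurvey's "false sociological beliefs" can be pictured as signed gaps between the community's predicted agreement rate on a statement and the actual agreement rate. The rates below are invented for illustration and are not the survey's published numbers.

```python
def sociological_gaps(questions):
    """For each question, the signed gap between the mean predicted
    agreement rate and the actual agreement rate. A positive gap means
    the community overestimates its own belief in the statement."""
    return {q: round(pred - actual, 2) for q, (actual, pred) in questions.items()}

# Hypothetical (actual, predicted) agreement rates per statement:
questions = {
    "benchmarks are useful":        (0.50, 0.75),
    "scaling solves NLP":           (0.45, 0.65),
    "linguistic structure matters": (0.60, 0.40),
}
gaps = sociological_gaps(questions)
```

A positive gap on a statement like "benchmarks are useful" would mirror the overestimation the survey reports, while a negative gap would mirror the underestimated belief in linguistic structure.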
NLP Researcher - FinTech Science Me Up
Our client is a Fintech that was founded on the principle that there is too much complex data online and insufficient support to help private and professional investors make sense of it. Their solutions sift and crunch data into the simple, concise, and actionable insights needed to succeed in today's markets. Within their R&D team, they are looking for an NLP Researcher to add new features and languages to their products. Their solutions use data from multiple sources (financial websites, news, social networks, etc.). The role involves collaborating with the team to create and develop machine learning, deep learning, and NLP models.
The field of natural language processing is chasing the wrong goal
At a typical annual meeting of the Association for Computational Linguistics (ACL), the program is a parade of titles like "A Structured Variational Autoencoder for Contextual Morphological Inflection." At this year's conference in July, though, something felt different--and it wasn't just the virtual format. Attendees' conversations were unusually introspective about the core methods and objectives of natural-language processing (NLP), the branch of AI focused on creating systems that analyze or generate human language. Papers in this year's new "Theme" track asked questions like: Are current methods really enough to achieve the field's ultimate goals? What even are those goals? My colleagues and I at Elemental Cognition, an AI research firm based in Connecticut and New York, see the angst as justified.
Best Books to Expand Your NLP Knowledge
The abundance of knowledge and resources can at times be overwhelming, especially with new-age technologies like Natural Language Processing, popularly known as NLP. When trying to educate yourself, choose resources with a solid foundation and recent books that deliver a well-rounded package of learnings. Here is a list of top books that can help you expand your NLP knowledge. One of the most widely referenced and recommended NLP books, written by Stanford University professor Dan Jurafsky and University of Colorado professor James Martin, provides a deep-dive guide to language processing. It is intended to accompany undergraduate or advanced graduate courses in Natural Language Processing or Computational Linguistics, but it is a must-read for anyone diving into the theory and application of language processing as they grow and strengthen their analytics capabilities.